Optimizing Retail Store Locations Based on Pedestrian Traffic Patterns
Authored by: Kushani Imanthi Ranasinghe
Duration: 90 mins
Level: Intermediate
Pre-requisite Skills: Python, Jupyter Notebook, Data Wrangling, Geospatial Analysis, Exploratory Data Analysis and Data Visualization
Scenario

As a business analyst in Melbourne, I want to analyze how factors like pedestrian traffic patterns, business density, and cafe/restaurant activity influence retail store performance, so that I can identify optimal store locations, maximize customer engagement, and support data-driven decision-making for retail businesses.

What this use case will teach you

At the end of this use case, you will:

  • Data Wrangling and Preprocessing: Gain expertise in handling multiple datasets, including pedestrian traffic data, business density data, and café/restaurant activity data, cleaning and preparing them for analysis.
  • Exploratory Data Analysis (EDA): Learn to visualize and analyze spatial and temporal trends in pedestrian traffic, business distributions, and associated activities to uncover patterns and insights.
  • Geospatial Analysis: Develop skills in analyzing location-based data, including mapping pedestrian hotspots and correlating them with business densities.
  • Data Integration: Master the integration of diverse datasets, such as combining pedestrian traffic counts with business activity and nearby amenities data, for comprehensive insights.
  • Predictive Modeling: Build and evaluate models to forecast high-potential areas for retail store success based on pedestrian patterns, business activity, and complementary services.
  • Visualization and Reporting: Create interactive maps and dashboards to effectively communicate insights and recommendations to business stakeholders.
  • Domain Knowledge in Business & Activity: Understand the relationship between urban activity, pedestrian dynamics, and retail performance, enabling data-driven decision-making for strategic retail planning.
Exploring the Influence of Pedestrian Traffic Patterns on Retail Store Success in Melbourne

Urban retail landscapes are becoming increasingly competitive as cities grow, requiring businesses to make data-driven decisions to thrive. Understanding the factors that influence retail success, such as pedestrian movement patterns and business density, is crucial for identifying high-potential locations. This use case focuses on analyzing these factors to help businesses strategically position their stores for maximum visibility and engagement.

The analysis leverages two primary datasets: the Pedestrian Counting System (counts per hour) and Business Establishments Location and Industry Classification data. These datasets, sourced from Melbourne's open data portal, provide insights into pedestrian traffic dynamics and existing business density. A third dataset, the Pedestrian Counting System sensor locations, is fetched later to supply the geospatial details of each sensor. By integrating this data, the use case aims to uncover actionable insights into high-traffic retail zones, helping businesses align their strategies with urban dynamics and enhance their competitive advantage.

This analysis supports Melbourne’s economic vitality by promoting smarter urban planning and enabling businesses to align with the city’s activity patterns for long-term success.

Datasets used
  • Pedestrian Counting System (Counts Per Hour)

This dataset records hourly pedestrian counts across various locations in Melbourne, providing valuable insights into foot traffic patterns. It helps identify areas with high pedestrian activity and peak times, critical for analyzing potential retail store locations. The dataset is sourced from the Melbourne Open Data website and can be accessed via API V2.1.

  • Business Establishments Location and Industry Classification

This dataset details the locations and industry classifications of businesses in Melbourne, offering insights into existing retail density and types of businesses operating in specific areas. It aids in understanding competitive landscapes and complementary activities around high-potential zones. This dataset is also sourced from the Melbourne Open Data website using API V2.1.

Required Libraries and Packages

This section imports essential libraries for data manipulation, visualization, geospatial analysis, interactive mapping, and fetching data from APIs. These libraries provide the necessary functionality for processing, analyzing, and visualizing the project data effectively.

In [52]:
# Basic Libraries
import pandas as pd
import numpy as np

# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Geospatial Analysis
import geopandas as gpd
from shapely.geometry import Point

# Interactive Maps
import folium

# Data Fetching and Processing
import requests
from io import StringIO
from io import BytesIO

Loading the Datasets Using API V2.1

This section defines two helper functions for fetching data. API_Unlimited retrieves a full dataset from the Melbourne Open Data Portal by dataset ID through the V2.1 export endpoint, loads the semicolon-delimited CSV response into a DataFrame, and prints a random sample of 10 rows for verification. fetch_data_from_url fetches a CSV file directly from a given URL, loads it into a DataFrame, and prints the first few rows. Both functions return the DataFrame on success.

In [54]:
# Function to collect data from the Melbourne Open Data Portal
def API_Unlimited(datasetname):  # pass in the dataset ID
    dataset_id = datasetname

    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    export_format = 'csv'  # named export_format to avoid shadowing the built-in format()

    url = f'{base_url}{dataset_id}/exports/{export_format}'
    params = {
        'select': '*',
        'limit': -1,  # all records
        'lang': 'en',
        'timezone': 'UTC'
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # StringIO to read the CSV data (the export is semicolon-delimited)
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        print(dataset.sample(10, random_state=999))  # Preview a random sample
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None


# Function to fetch data from a URL
def fetch_data_from_url(url):
    response = requests.get(url)
    if response.status_code == 200:
        data = pd.read_csv(StringIO(response.content.decode('utf-8')))
        print(data.head())  # Display the first few rows for verification
        return data
    else:
        print(f"Failed to fetch data: {response.status_code}")
        return None
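
The request logic above can be tried without network access. The sketch below rebuilds the same export URL and parses a small semicolon-delimited sample the way API_Unlimited does. The helper name build_export_url and the inline sample are illustrative assumptions, not part of the original notebook; the two sample rows are taken from the preview output shown later.

```python
from io import StringIO
from urllib.parse import urlencode

import pandas as pd

BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'


def build_export_url(dataset_id, file_format='csv'):
    """Hypothetical helper: return the export URL that API_Unlimited requests."""
    params = {'select': '*', 'limit': -1, 'lang': 'en', 'timezone': 'UTC'}
    return f'{BASE_URL}{dataset_id}/exports/{file_format}?{urlencode(params)}'


# Illustrative sample only: a few columns from two preview rows,
# laid out with the ';' delimiter that the notebook passes to read_csv.
sample_csv = (
    'location_id;sensing_date;hourday;pedestriancount\n'
    '52;2023-12-13;20;544\n'
    '23;2023-07-11;18;468\n'
)

export_url = build_export_url('pedestrian-counting-system-monthly-counts-per-hour')
sample_df = pd.read_csv(StringIO(sample_csv), delimiter=';')

print(export_url)
print(sample_df)
```

Keeping the URL construction separate from the download makes it easy to paste the printed URL into a browser when a request fails.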

Fetching and Previewing Datasets

This section defines the dataset IDs required for the use case and fetches the corresponding data using the API_Unlimited function. The datasets include pedestrian traffic counts and business establishment details, which are essential for analyzing traffic patterns and business density. Because API_Unlimited prints a random 10-row sample as it loads each dataset, the output below shows those two samples first, followed by the first five rows of each dataset, which confirm that loading succeeded.

In [56]:
# Define dataset IDs for my use case
pedestrian_traffic_dataset_id = 'pedestrian-counting-system-monthly-counts-per-hour'
business_establishments_dataset_id = 'business-establishments-with-address-and-industry-classification'

# Fetch datasets
pedestrian_traffic_data = API_Unlimited(pedestrian_traffic_dataset_id)
business_establishments_data = API_Unlimited(business_establishments_dataset_id)

# Display the fetched datasets
print("\nPedestrian Traffic Dataset:")
print(pedestrian_traffic_data.head())

print("\nBusiness Establishments Dataset:")
print(business_establishments_data.head())
                    id  location_id sensing_date  hourday  direction_1  \
830362     28820241006           28   2024-10-06        8          140   
472623   1401820240303          140   2024-03-03       18           81   
378922    311620240413           31   2024-04-13       16          198   
1244588    48420220404           48   2022-04-04        4            3   
567514    611020240422           61   2024-04-22       10          330   
1086213   232220220108           23   2022-01-08       22           33   
1599708   502320240125           50   2024-01-25       23           58   
1209273    85620240105           85   2024-01-05        6           12   
1141266    18120230220           18   2023-02-20        1            1   
1355709   431020210818           43   2021-08-18       10            9   

         direction_2  pedestriancount sensor_name                    location  
830362            72              212       VAC_T  -37.82129925, 144.96879309  
472623           106              187  Boyd2837_T  -37.82590962, 144.96185972  
378922           206              404    Lyg161_T  -37.80169681, 144.96658911  
1244588            4                7      QVMQ_T  -37.80631581, 144.95866697  
567514           469              799    RMIT14_T  -37.80767455, 144.96309114  
1086213           48               81    Col623_T  -37.81909256, 144.95452748  
1599708           21               79    Lyg309_T  -37.79808192, 144.96721013  
1209273            7               19    488Mac_T  -37.79432415, 144.92973378  
1141266            4                5     Col12_T  -37.81344862, 144.97305353  
1355709           12               21       UM2_T  -37.79844526, 144.96411782  
        census_year  block_id  property_id  base_property_id  clue_small_area  \
356160         2020       752       110734            110733        Southbank   
204116         2004       861       102500            102500      South Yarra   
41994          2005        27       102067            102067  Melbourne (CBD)   
11828          2017      1109       573327            573327        Docklands   
135276         2008        35       102143            102143  Melbourne (CBD)   
129888         2009        12       110091            110091  Melbourne (CBD)   
133276         2009       361       573297            573297  North Melbourne   
344163         2018       310       628739            628739  North Melbourne   
110883         2022      1101       110843            110843        Docklands   
27432          2006       247       106235            106235          Carlton   

                                trading_name  \
356160                 Sake Restaurant & Bar   
204116       Steen & Tan Proprietary Limited   
41994   Anthony Squires Fine Quality Clothes   
11828                 New Quay Asian Grocery   
135276                     Rivette And Blair   
129888                         Flight Centre   
133276                         Easli Pty Ltd   
344163                   Swim Communications   
110883                           Coffee Rush   
27432                Poppy's Thai Restaurant   

                                         business_address  \
356160          Part 100 St Kilda Road SOUTHBANK VIC 3006   
204116                   153 Domain Road SOUTH YARRA 3141   
41994           Shop 8, 101 Collins Street MELBOURNE 3000   
11828                   15-17 Caravel Lane DOCKLANDS 3008   
135276  Shop 210, Ground , 260 Collins Street MELBOURN...   
129888           Shop 9, 15 William Street MELBOURNE 3000   
133276  Unit 30, Ground , 1 O'Connell Street NORTH MEL...   
344163  Ground , 134 Langford Street NORTH MELBOURNE 3051   
110883  Shop 310, Ground 229 Spencer Street DOCKLANDS ...   
27432                       230 Lygon Street CARLTON 3053   

        industry_anzsic4_code  \
356160                   4511   
204116                   6921   
41994                    4254   
11828                    4110   
135276                   4255   
129888                   7220   
133276                   7311   
344163                   7000   
110883                   4511   
27432                    4511   

                           industry_anzsic4_description   longitude  \
356160                            Cafes and Restaurants  144.968170   
204116                           Architectural Services  144.981278   
41994                           Mens Clothing Retailing  144.970782   
11828                    Supermarket and Grocery Stores  144.940675   
135276                        Womens Clothing Retailing  144.964909   
129888      Travel Agency and Tour Arrangement Services  144.959198   
133276  Building and Other Industrial Cleaning Services  144.957973   
344163      Computer System Design and Related Services  144.936880   
110883                            Cafes and Restaurants  144.950564   
27432                             Cafes and Restaurants  144.967220   

         latitude                                   point  
356160 -37.820942  -37.82094204031448, 144.96817034075002  
204116 -37.834261     -37.8342614498216, 144.981277576097  
41994  -37.814894    -37.8148943012917, 144.9707823311265  
11828  -37.814922      -37.8149216286, 144.94067463890494  
135276 -37.815414    -37.815413696488875, 144.96490915325  
129888 -37.819294      -37.81929390638578, 144.9591980659  
133276 -37.805723       -37.8057225963, 144.9579731919805  
344163 -37.796126   -37.7961257162923, 144.93687974609998  
110883 -37.814509     -37.814508973578526, 144.9505641426  
27432  -37.801805  -37.80180542035775, 144.96721958448984  

Pedestrian Traffic Dataset:
             id  location_id sensing_date  hourday  direction_1  direction_2  \
0  522020231213           52   2023-12-13       20          368          176   
1  231820230711           23   2023-07-11       18          158          310   
2   72220230817           72   2023-08-17        2            2            4   
3  581120241106           58   2024-11-06       11          857          282   
4   23620220731           23   2022-07-31        6            8           27   

   pedestriancount sensor_name                    location  
0              544    Eli263_T   -37.81252157, 144.9619401  
1              468    Col623_T  -37.81909256, 144.95452748  
2                6      ACMI_T  -37.81726338, 144.96872809  
3             1139    Bou688_T  -37.81686075, 144.95358075  
4               35    Col623_T  -37.81909256, 144.95452748  

Business Establishments Dataset:
   census_year  block_id  property_id  base_property_id  clue_small_area  \
0         2021        27       103596            103596  Melbourne (CBD)   
1         2021        27       103968            103968  Melbourne (CBD)   
2         2021        27       103968            103968  Melbourne (CBD)   
3         2021        27       103968            103968  Melbourne (CBD)   
4         2021        27       103968            103968  Melbourne (CBD)   

                                trading_name  \
0  TMF Corporate Services (Aust) Pty Limited   
1                         Arena Reit Limited   
2                          Taxbanter Pty Ltd   
3                                 Pask Group   
4             Webb Martin Consulting Pty Ltd   

                                    business_address  industry_anzsic4_code  \
0  Part Level 9 63 Exhibition Street MELBOURNE VI...                   6932   
1  Suite 5, Level 5 41 Exhibition Street MELBOURN...                   6720   
2  Part Suite 9, Level 9 41 Exhibition Street MEL...                   6932   
3  Part Suite 15, Level 15 41 Exhibition Street M...                   3011   
4  Part Suite 9, Level 9 41 Exhibition Street MEL...                   6932   

  industry_anzsic4_description   longitude   latitude  \
0          Accounting Services  144.971304 -37.814602   
1         Real Estate Services  144.971575 -37.815017   
2          Accounting Services  144.971575 -37.815017   
3           House Construction  144.971575 -37.815017   
4          Accounting Services  144.971575 -37.815017   

                                point  
0    -37.814602312, 144.9713042703283  
1  -37.81501688045, 144.9715754974218  
2  -37.81501688045, 144.9715754974218  
3  -37.81501688045, 144.9715754974218  
4  -37.81501688045, 144.9715754974218  

Displaying Dataset Overview

This part of the code verifies the datasets by displaying their dimensions and a preview of the first few rows. It ensures that the pedestrian traffic and business establishments data have been successfully loaded and are ready for analysis.

In [58]:
# Display the shape and first rows of the "Pedestrian Traffic" dataset (already loaded above)
print(f'The shape of the Pedestrian Traffic dataset is {pedestrian_traffic_data.shape}.')
print('Below are the first few rows of this dataset:')

print("Pedestrian Traffic Dataset:")
print(pedestrian_traffic_data.head())

# --------------------

# Display the shape and first rows of the "Business Establishments" dataset (already loaded above)
print(f'The shape of the Business Establishments dataset is {business_establishments_data.shape}.')
print('Below are the first few rows of this dataset:')

print("Business Establishments Dataset:")
print(business_establishments_data.head())
The shape of the Pedestrian Traffic dataset is (2025362, 9).
Below are the first few rows of this dataset:
Pedestrian Traffic Dataset:
             id  location_id sensing_date  hourday  direction_1  direction_2  \
0  522020231213           52   2023-12-13       20          368          176   
1  231820230711           23   2023-07-11       18          158          310   
2   72220230817           72   2023-08-17        2            2            4   
3  581120241106           58   2024-11-06       11          857          282   
4   23620220731           23   2022-07-31        6            8           27   

   pedestriancount sensor_name                    location  
0              544    Eli263_T   -37.81252157, 144.9619401  
1              468    Col623_T  -37.81909256, 144.95452748  
2                6      ACMI_T  -37.81726338, 144.96872809  
3             1139    Bou688_T  -37.81686075, 144.95358075  
4               35    Col623_T  -37.81909256, 144.95452748  
The shape of the Business Establishments dataset is (374210, 12).
Below are the first few rows of this dataset:
Business Establishments Dataset:
   census_year  block_id  property_id  base_property_id  clue_small_area  \
0         2021        27       103596            103596  Melbourne (CBD)   
1         2021        27       103968            103968  Melbourne (CBD)   
2         2021        27       103968            103968  Melbourne (CBD)   
3         2021        27       103968            103968  Melbourne (CBD)   
4         2021        27       103968            103968  Melbourne (CBD)   

                                trading_name  \
0  TMF Corporate Services (Aust) Pty Limited   
1                         Arena Reit Limited   
2                          Taxbanter Pty Ltd   
3                                 Pask Group   
4             Webb Martin Consulting Pty Ltd   

                                    business_address  industry_anzsic4_code  \
0  Part Level 9 63 Exhibition Street MELBOURNE VI...                   6932   
1  Suite 5, Level 5 41 Exhibition Street MELBOURN...                   6720   
2  Part Suite 9, Level 9 41 Exhibition Street MEL...                   6932   
3  Part Suite 15, Level 15 41 Exhibition Street M...                   3011   
4  Part Suite 9, Level 9 41 Exhibition Street MEL...                   6932   

  industry_anzsic4_description   longitude   latitude  \
0          Accounting Services  144.971304 -37.814602   
1         Real Estate Services  144.971575 -37.815017   
2          Accounting Services  144.971575 -37.815017   
3           House Construction  144.971575 -37.815017   
4          Accounting Services  144.971575 -37.815017   

                                point  
0    -37.814602312, 144.9713042703283  
1  -37.81501688045, 144.9715754974218  
2  -37.81501688045, 144.9715754974218  
3  -37.81501688045, 144.9715754974218  
4  -37.81501688045, 144.9715754974218  

Data Cleaning and Preparation

- Check for Missing and Duplicate Values

This section performs a data quality check by identifying missing values and duplicate rows in the pedestrian traffic and business establishments datasets. This helps ensure the data is clean and ready for further analysis by highlighting potential issues that need to be addressed.

In [60]:
# Check for missing values in pedestrian traffic dataset
print("Missing values in Pedestrian Traffic Dataset:")
print(pedestrian_traffic_data.isnull().sum())

# Check for missing values in business establishments dataset
print("\nMissing values in Business Establishments Dataset:")
print(business_establishments_data.isnull().sum())

# Check for duplicate rows in pedestrian traffic dataset
print(f"\nDuplicate rows in Pedestrian Traffic Dataset: {pedestrian_traffic_data.duplicated().sum()}")

# Check for duplicate rows in business establishments dataset
print(f"Duplicate rows in Business Establishments Dataset: {business_establishments_data.duplicated().sum()}")
Missing values in Pedestrian Traffic Dataset:
id                 0
location_id        0
sensing_date       0
hourday            0
direction_1        0
direction_2        0
pedestriancount    0
sensor_name        0
location           0
dtype: int64

Missing values in Business Establishments Dataset:
census_year                        0
block_id                           0
property_id                        0
base_property_id                   0
clue_small_area                    0
trading_name                     127
business_address                   1
industry_anzsic4_code              0
industry_anzsic4_description       0
longitude                       4785
latitude                        4785
point                           4785
dtype: int64

Duplicate rows in Pedestrian Traffic Dataset: 0
Duplicate rows in Business Establishments Dataset: 0
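
Raw counts are easier to judge against the size of the dataset: 4785 missing coordinates out of 374210 rows is roughly 1.3%. The sketch below reports both the count and the percentage for columns that have gaps. The helper name missing_summary and the toy values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Hypothetical helper: count and percentage of missing values per affected column."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        'missing': counts,
        'percent': (counts / len(df) * 100).round(2),
    })
    # Keep only columns that actually have gaps, largest first.
    return summary[summary['missing'] > 0].sort_values('missing', ascending=False)


# Toy frame standing in for business_establishments_data (values are made up).
toy = pd.DataFrame({
    'trading_name': ['Coffee Rush', None, 'Pask Group', 'Flight Centre'],
    'longitude': [144.95, np.nan, np.nan, 144.96],
    'latitude': [-37.81, np.nan, np.nan, -37.82],
    'census_year': [2022, 2021, 2021, 2009],
})

report = missing_summary(toy)
print(report)
```

On the real data the same call would be missing_summary(business_establishments_data).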

- Handle Missing and Duplicate Data

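This section addresses missing values in the business establishments dataset. No duplicate rows were found in either dataset, so no deduplication is needed. Categorical columns like trading_name and business_address are filled with their most frequent values (mode), while numerical columns such as longitude and latitude are filled with their mean values. Note that mean-filled coordinates place all 4785 affected records at the same point, so these records should be interpreted with care in any spatial analysis. Additionally, the point column is removed because it duplicates the latitude and longitude columns.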

In [62]:
# Handle missing values for 'trading_name' (categorical column) using the mode
business_establishments_data['trading_name'] = business_establishments_data['trading_name'].fillna(
    business_establishments_data['trading_name'].mode()[0]
)

# Handle missing values for 'business_address' (categorical column) using the mode
business_establishments_data['business_address'] = business_establishments_data['business_address'].fillna(
    business_establishments_data['business_address'].mode()[0]
)

# Handle missing values for 'longitude' and 'latitude' (numerical columns) using the mean
business_establishments_data['longitude'] = business_establishments_data['longitude'].fillna(
    business_establishments_data['longitude'].mean()
)
business_establishments_data['latitude'] = business_establishments_data['latitude'].fillna(
    business_establishments_data['latitude'].mean()
)

# Drop the 'point' column since it's redundant
business_establishments_data = business_establishments_data.drop(columns=['point'])
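
Because the mean fill gives every record with missing coordinates the same location, it can help to record which rows were filled so that they can be left out of mapping or distance calculations. The following is a minimal sketch of that idea on a toy frame; the flag column name coords_imputed and the toy values are assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for business_establishments_data (values are made up).
businesses = pd.DataFrame({
    'trading_name': ['A', 'B', 'C', 'D'],
    'longitude': [144.96, np.nan, 144.98, np.nan],
    'latitude': [-37.81, np.nan, -37.83, np.nan],
})

# Record which rows lack coordinates BEFORE filling them.
businesses['coords_imputed'] = businesses[['longitude', 'latitude']].isnull().any(axis=1)

# Same mean fill as in the cell above.
for col in ['longitude', 'latitude']:
    businesses[col] = businesses[col].fillna(businesses[col].mean())

# Spatial steps can then be restricted to rows with observed coordinates.
located = businesses[~businesses['coords_imputed']]

print(businesses)
print(f'{len(located)} of {len(businesses)} rows have observed coordinates')
```

The flag must be created before the fill; afterwards the filled rows can no longer be told apart from observed ones by a null check.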

- Convert Data Types

This part ensures the data is in the correct format for analysis. The sensing_date column is converted to datetime format, and numerical columns like pedestriancount, longitude, and latitude are converted to numeric types. Data types are then verified to confirm the transformations were successful.

In [64]:
# Convert 'sensing_date' column to datetime
pedestrian_traffic_data['sensing_date'] = pd.to_datetime(pedestrian_traffic_data['sensing_date'], errors='coerce')

# Ensure numerical columns are in the correct format
pedestrian_traffic_data['pedestriancount'] = pd.to_numeric(pedestrian_traffic_data['pedestriancount'], errors='coerce')

business_establishments_data['longitude'] = pd.to_numeric(business_establishments_data['longitude'], errors='coerce')
business_establishments_data['latitude'] = pd.to_numeric(business_establishments_data['latitude'], errors='coerce')

# Verify data types
print("\nPedestrian Traffic Dataset Data Types:")
print(pedestrian_traffic_data.dtypes)

print("\nBusiness Establishments Dataset Data Types:")
print(business_establishments_data.dtypes)
Pedestrian Traffic Dataset Data Types:
id                          int64
location_id                 int64
sensing_date       datetime64[ns]
hourday                     int64
direction_1                 int64
direction_2                 int64
pedestriancount             int64
sensor_name                object
location                   object
dtype: object

Business Establishments Dataset Data Types:
census_year                       int64
block_id                          int64
property_id                       int64
base_property_id                  int64
clue_small_area                  object
trading_name                     object
business_address                 object
industry_anzsic4_code             int64
industry_anzsic4_description     object
longitude                       float64
latitude                        float64
dtype: object
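
In the output above, the location column of the pedestrian traffic dataset is still an object column holding "latitude, longitude" strings. If numeric coordinates are needed directly from that column, one option is to split the string, as sketched below on two rows taken from the preview. The new column names latitude and longitude are assumptions for illustration.

```python
import pandas as pd

# Two rows copied from the pedestrian traffic preview.
traffic = pd.DataFrame({
    'sensor_name': ['Eli263_T', 'Col623_T'],
    'location': ['-37.81252157, 144.9619401', '-37.81909256, 144.95452748'],
})

# Split 'lat, lon' on the comma, then coerce both parts to float.
coords = traffic['location'].str.split(',', expand=True)
traffic['latitude'] = pd.to_numeric(coords[0].str.strip(), errors='coerce')
traffic['longitude'] = pd.to_numeric(coords[1].str.strip(), errors='coerce')

print(traffic.dtypes)
print(traffic[['sensor_name', 'latitude', 'longitude']])
```

With errors='coerce', any malformed location string becomes NaN rather than raising, which matches the conversion style used in the cell above.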

- Verify Data Cleaning

This part of the code displays the first few rows of the cleaned pedestrian traffic and business establishments datasets as a visual spot check. The business establishments preview no longer includes the point column, and the data type output above confirms the type conversions. A preview of five rows cannot by itself show that every missing value was handled, so it complements the earlier checks rather than replacing them.

In [66]:
# Verify the cleaned datasets
print("Cleaned Pedestrian Traffic Dataset:")
print(pedestrian_traffic_data.head())

print("\nCleaned Business Establishments Dataset:")
print(business_establishments_data.head())
Cleaned Pedestrian Traffic Dataset:
             id  location_id sensing_date  hourday  direction_1  direction_2  \
0  522020231213           52   2023-12-13       20          368          176   
1  231820230711           23   2023-07-11       18          158          310   
2   72220230817           72   2023-08-17        2            2            4   
3  581120241106           58   2024-11-06       11          857          282   
4   23620220731           23   2022-07-31        6            8           27   

   pedestriancount sensor_name                    location  
0              544    Eli263_T   -37.81252157, 144.9619401  
1              468    Col623_T  -37.81909256, 144.95452748  
2                6      ACMI_T  -37.81726338, 144.96872809  
3             1139    Bou688_T  -37.81686075, 144.95358075  
4               35    Col623_T  -37.81909256, 144.95452748  

Cleaned Business Establishments Dataset:
   census_year  block_id  property_id  base_property_id  clue_small_area  \
0         2021        27       103596            103596  Melbourne (CBD)   
1         2021        27       103968            103968  Melbourne (CBD)   
2         2021        27       103968            103968  Melbourne (CBD)   
3         2021        27       103968            103968  Melbourne (CBD)   
4         2021        27       103968            103968  Melbourne (CBD)   

                                trading_name  \
0  TMF Corporate Services (Aust) Pty Limited   
1                         Arena Reit Limited   
2                          Taxbanter Pty Ltd   
3                                 Pask Group   
4             Webb Martin Consulting Pty Ltd   

                                    business_address  industry_anzsic4_code  \
0  Part Level 9 63 Exhibition Street MELBOURNE VI...                   6932   
1  Suite 5, Level 5 41 Exhibition Street MELBOURN...                   6720   
2  Part Suite 9, Level 9 41 Exhibition Street MEL...                   6932   
3  Part Suite 15, Level 15 41 Exhibition Street M...                   3011   
4  Part Suite 9, Level 9 41 Exhibition Street MEL...                   6932   

  industry_anzsic4_description   longitude   latitude  
0          Accounting Services  144.971304 -37.814602  
1         Real Estate Services  144.971575 -37.815017  
2          Accounting Services  144.971575 -37.815017  
3           House Construction  144.971575 -37.815017  
4          Accounting Services  144.971575 -37.815017  
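
A preview only shows a handful of rows, so the same verification can also be expressed as explicit checks over the whole DataFrame. The sketch below is one way to do that; the helper name check_clean and the toy frames are assumptions for illustration.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def check_clean(df, datetime_cols=(), numeric_cols=()):
    """Hypothetical helper: return a list of problems; an empty list means all checks passed."""
    problems = []
    for col, n_missing in df.isnull().sum().items():
        if n_missing > 0:
            problems.append(f'{col}: {n_missing} missing value(s)')
    for col in datetime_cols:
        if not is_datetime64_any_dtype(df[col]):
            problems.append(f'{col}: not datetime')
    for col in numeric_cols:
        if not is_numeric_dtype(df[col]):
            problems.append(f'{col}: not numeric')
    return problems


# Toy frames: one cleaned, one with a missing, unconverted date.
clean = pd.DataFrame({
    'sensing_date': pd.to_datetime(['2023-12-13', '2023-07-11']),
    'pedestriancount': [544, 468],
})
dirty = pd.DataFrame({
    'sensing_date': ['2023-12-13', None],
    'pedestriancount': [544, 468],
})

clean_problems = check_clean(clean, ['sensing_date'], ['pedestriancount'])
dirty_problems = check_clean(dirty, ['sensing_date'], ['pedestriancount'])

print('clean:', clean_problems)
print('dirty:', dirty_problems)
```

On the real data, the call would look like check_clean(pedestrian_traffic_data, ['sensing_date'], ['pedestriancount']).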

- Sensor Locations Dataset Fetching

This section retrieves the pedestrian sensor locations dataset, which provides geospatial details of sensor placements, using the API_Unlimited function. A sample of the data is displayed to verify successful fetching and ensure it is ready for integration with other datasets.

In [68]:
# Fetch the Pedestrian Sensor Locations dataset
sensor_locations_dataset_id = 'pedestrian-counting-system-sensor-locations'
sensor_locations = API_Unlimited(sensor_locations_dataset_id)

# Verify the fetched data
print("\nSensor Locations Dataset Sample:")
print(sensor_locations.head())
     location_id                                 sensor_description  \
61            65                          Swanston St - City Square   
93            17                              Collins Place (South)   
29            87                                    Errol St (West)   
126          137     COM Pole 2353 - Towards the city, NAB Building   
0              2                         Bourke Street Mall (South)   
24            72                                  Flinders St- ACMI   
18            43                       Monash Rd-Swanston St (West)   
125          131  I-Hub Corner of King Street and Flinders Stree...   
53            41                     Flinders La-Swanston St (West)   
140          166                                 484 Spencer Street   

      sensor_name installation_date  \
61        SwaCs_T        2020-03-12   
93        Col15_T        2009-03-30   
29      Errol23_T        2022-05-20   
126  BouHbr2353_T        2023-11-03   
0        Bou283_T        2009-03-30   
24         ACMI_T        2020-11-30   
18          UM2_T        2015-04-15   
125       King2_T        2023-09-25   
53          Swa31        2017-06-29   
140     Spen484_T        2024-09-25   

                                           note location_type status  \
61                                          NaN       Outdoor      A   
93             Device is upgraded in 26/02/2020       Outdoor      A   
29                                          NaN       Outdoor      A   
126                                         NaN       Outdoor      A   
0                                           NaN       Outdoor      A   
24                                          NaN       Outdoor      A   
18                                          NaN       Outdoor      A   
125                                         NaN       Outdoor      A   
53                                          NaN       Outdoor      A   
140  Former sensor 227 Bourke Street – City Lab       Outdoor      A   

    direction_1 direction_2   latitude   longitude                    location  
61        North       South -37.815694  144.966806   -37.81569416, 144.9668064  
93         East        West -37.813625  144.973236  -37.81362543, 144.97323591  
29        North       South -37.804549  144.949219  -37.80454949, 144.94921863  
126        East        West -37.818948  144.946123  -37.81894815, 144.94612292  
0          East        West -37.813807  144.965167  -37.81380668, 144.96516718  
24         East        West -37.817263  144.968728  -37.81726338, 144.96872809  
18        North       South -37.798445  144.964118  -37.79844526, 144.96411782  
125       North       South -37.820091  144.957587  -37.82009057, 144.95758725  
53        North       South -37.816686  144.966897  -37.81668634, 144.96689733  
140       North       South -37.808967  144.949317  -37.80896733, 144.94931703  

Sensor Locations Dataset Sample:
   location_id                 sensor_description sensor_name  \
0            2         Bourke Street Mall (South)    Bou283_T   
1            4                   Town Hall (West)    Swa123_T   
2            6  Flinders Street Station Underpass      FliS_T   
3            8                        Webb Bridge     WebBN_T   
4           10                     Victoria Point    BouHbr_T   

  installation_date                 note location_type status direction_1  \
0        2009-03-30                  NaN       Outdoor      A        East   
1        2009-03-23                  NaN       Outdoor      A       North   
2        2009-03-25  Upgraded on 8/09/21       Outdoor      A       North   
3        2009-03-24                  NaN       Outdoor      A       North   
4        2009-04-23                  NaN       Outdoor      A        East   

  direction_2   latitude   longitude                    location  
0        West -37.813807  144.965167  -37.81380668, 144.96516718  
1       South -37.814880  144.966088   -37.81487988, 144.9660878  
2       South -37.819117  144.965583  -37.81911705, 144.96558255  
3       South -37.822935  144.947175   -37.82293543, 144.9471751  
4        West -37.818765  144.947105  -37.81876474, 144.94710545  

- Cleaning and Validating Sensor Locations Dataset¶

This section checks the sensor locations dataset for missing values and addresses any issues by dropping rows with missing longitude or latitude. It also ensures these columns are converted to numeric types for geospatial analysis. Finally, a sample of the cleaned data is displayed to confirm that the dataset is ready for further use.

In [70]:
# Check for missing values and data types
print("\nMissing values in Sensor Locations Dataset:")
print(sensor_locations.isnull().sum())

# Ensure longitude and latitude are numeric (unparseable values become NaN)
sensor_locations['longitude'] = pd.to_numeric(sensor_locations['longitude'], errors='coerce')
sensor_locations['latitude'] = pd.to_numeric(sensor_locations['latitude'], errors='coerce')

# Drop rows with missing longitude/latitude, including values coerced to NaN above
sensor_locations = sensor_locations.dropna(subset=['longitude', 'latitude']).copy()

# Verify cleaned data
print("\nCleaned Sensor Locations Dataset Sample:")
print(sensor_locations.head())
Missing values in Sensor Locations Dataset:
location_id             0
sensor_description      2
sensor_name             0
installation_date       2
note                  108
location_type           0
status                  0
direction_1            32
direction_2            32
latitude                0
longitude               0
location                0
dtype: int64

Cleaned Sensor Locations Dataset Sample:
   location_id                 sensor_description sensor_name  \
0            2         Bourke Street Mall (South)    Bou283_T   
1            4                   Town Hall (West)    Swa123_T   
2            6  Flinders Street Station Underpass      FliS_T   
3            8                        Webb Bridge     WebBN_T   
4           10                     Victoria Point    BouHbr_T   

  installation_date                 note location_type status direction_1  \
0        2009-03-30                  NaN       Outdoor      A        East   
1        2009-03-23                  NaN       Outdoor      A       North   
2        2009-03-25  Upgraded on 8/09/21       Outdoor      A       North   
3        2009-03-24                  NaN       Outdoor      A       North   
4        2009-04-23                  NaN       Outdoor      A        East   

  direction_2   latitude   longitude                    location  
0        West -37.813807  144.965167  -37.81380668, 144.96516718  
1       South -37.814880  144.966088   -37.81487988, 144.9660878  
2       South -37.819117  144.965583  -37.81911705, 144.96558255  
3       South -37.822935  144.947175   -37.82293543, 144.9471751  
4        West -37.818765  144.947105  -37.81876474, 144.94710545  
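
Beyond missing values, coordinates can also be checked for plausibility. The following is a minimal sketch on toy data, not part of the original notebook: the bounding box values are illustrative assumptions that loosely enclose central Melbourne, and `flag_out_of_bounds` is a hypothetical helper name.

```python
import pandas as pd

# Assumed, illustrative bounds that loosely enclose central Melbourne
LAT_RANGE = (-37.90, -37.75)
LON_RANGE = (144.88, 145.02)

def flag_out_of_bounds(df, lat_col="latitude", lon_col="longitude"):
    """Return rows whose coordinates fall outside the assumed bounding box."""
    lat = pd.to_numeric(df[lat_col], errors="coerce")
    lon = pd.to_numeric(df[lon_col], errors="coerce")
    # NaN values compare as False, so they are flagged as well
    inside = lat.between(*LAT_RANGE) & lon.between(*LON_RANGE)
    return df[~inside]

toy = pd.DataFrame({
    "location_id": [2, 4, 999],
    "latitude": [-37.813807, -37.814880, 37.814880],  # last row has a flipped sign
    "longitude": [144.965167, 144.966088, 144.966088],
})
suspect = flag_out_of_bounds(toy)
print(suspect["location_id"].tolist())
```

A check like this would catch swapped or sign-flipped coordinates that a null check alone cannot detect.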

- Merging Pedestrian Traffic Data with Sensor Locations¶

This section merges the pedestrian traffic data with the sensor locations dataset using location_id as the common key. A left join is performed to ensure all rows from the pedestrian traffic data are retained, while adding the corresponding longitude and latitude values from the sensor locations. A sample of the merged dataset is displayed to confirm the successful integration of geospatial information.

In [72]:
# Merge pedestrian traffic data with sensor locations
pedestrian_traffic_data = pedestrian_traffic_data.merge(
    sensor_locations[['location_id', 'longitude', 'latitude']],
    on='location_id',  # Use 'location_id' as the common key
    how='left'         # Left join to retain all rows from pedestrian_traffic_data
)

# Verify the merged dataset
print("\nPedestrian Traffic Dataset with Geospatial Data:")
print(pedestrian_traffic_data.head())
Pedestrian Traffic Dataset with Geospatial Data:
             id  location_id sensing_date  hourday  direction_1  direction_2  \
0  522020231213           52   2023-12-13       20          368          176   
1  231820230711           23   2023-07-11       18          158          310   
2   72220230817           72   2023-08-17        2            2            4   
3  581120241106           58   2024-11-06       11          857          282   
4   23620220731           23   2022-07-31        6            8           27   

   pedestriancount sensor_name                    location   longitude  \
0              544    Eli263_T   -37.81252157, 144.9619401  144.961940   
1              468    Col623_T  -37.81909256, 144.95452748  144.954527   
2                6      ACMI_T  -37.81726338, 144.96872809  144.968728   
3             1139    Bou688_T  -37.81686075, 144.95358075  144.953581   
4               35    Col623_T  -37.81909256, 144.95452748  144.954527   

    latitude  
0 -37.812522  
1 -37.819093  
2 -37.817263  
3 -37.816861  
4 -37.819093  
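
A left join can silently duplicate traffic rows if a location_id appears more than once in the sensor table. One way to guard against this, sketched here on toy data with illustrative values, is to use the `validate` and `indicator` parameters of `DataFrame.merge`; the frame names `counts` and `sensors` are stand-ins, not the notebook's variables.

```python
import pandas as pd

# Toy stand-ins for the two tables (values are illustrative)
counts = pd.DataFrame({
    "location_id": [52, 23, 23, 999],
    "pedestriancount": [544, 468, 35, 10],
})
sensors = pd.DataFrame({
    "location_id": [52, 23],
    "longitude": [144.961940, 144.954527],
    "latitude": [-37.812522, -37.819093],
})

# validate="many_to_one" raises MergeError if a location_id repeats in `sensors`,
# which would otherwise silently duplicate traffic rows.
merged = counts.merge(
    sensors, on="location_id", how="left",
    validate="many_to_one", indicator=True,
)

# Rows marked "left_only" found no matching sensor and have NaN coordinates
unmatched = merged.loc[merged["_merge"] == "left_only", "location_id"].unique().tolist()
print(len(merged), unmatched)
```

The row count after the merge should equal the row count before it, and the list of unmatched IDs shows exactly which sensors lack coordinates.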

- Checking for Missing Geospatial Data After Merge¶

This section checks for any rows in the pedestrian traffic dataset that are missing geospatial data (longitude and latitude) after the merge with the sensor locations. The count of such rows is displayed to identify any remaining gaps in geospatial information that may need to be addressed.

In [74]:
# Check for missing longitude or latitude after merge
missing_geo = pedestrian_traffic_data[
    pedestrian_traffic_data[['longitude', 'latitude']].isnull().any(axis=1)
]
print(f"\nRows with missing geospatial data: {missing_geo.shape[0]}")
Rows with missing geospatial data: 0

- Add Geometric Data for Spatial Analysis¶

This section creates a geometry column for both the pedestrian traffic and business establishments datasets. Each row's longitude and latitude are converted into Point objects using the shapely.geometry.Point class. These geometry columns are essential for geospatial analysis, enabling mapping and spatial operations.

In [76]:
from shapely.geometry import Point

# Create geometry column for pedestrian traffic data
pedestrian_traffic_data['geometry'] = pedestrian_traffic_data.apply(
    lambda row: Point(float(row['longitude']), float(row['latitude'])), axis=1
)

# Create geometry column for business establishments data
business_establishments_data['geometry'] = business_establishments_data.apply(
    lambda row: Point(float(row['longitude']), float(row['latitude'])), axis=1
)
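
Row-wise `apply` calls Python once per row, which can be slow on the roughly 2.2 million pedestrian records. A possible vectorized alternative is `gpd.points_from_xy`, sketched below on toy data and assuming geopandas is imported as `gpd`, as elsewhere in this notebook.

```python
import geopandas as gpd
import pandas as pd

toy = pd.DataFrame({
    "longitude": [144.961940, 144.954527],
    "latitude": [-37.812522, -37.819093],
})

# Build all Point geometries in one vectorized call (x = longitude, y = latitude)
toy_gdf = gpd.GeoDataFrame(
    toy,
    geometry=gpd.points_from_xy(toy["longitude"], toy["latitude"]),
    crs="EPSG:4326",
)
print(toy_gdf.geom_type.value_counts())
```

The result should be equivalent to the per-row `Point(...)` construction, with the coordinate reference system set in the same step.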

- Prepare GeoDataFrames¶

This section converts the pedestrian traffic and business establishments datasets into GeoDataFrames using geopandas. The geometry column created earlier is used to define spatial features, and the coordinate reference system (CRS) is set to EPSG:4326 for geographic coordinates (latitude and longitude). The resulting GeoDataFrames are then verified by displaying sample rows to confirm the data structure is suitable for geospatial analysis.

In [78]:
# Convert pedestrian traffic data to a GeoDataFrame
pedestrian_gdf = gpd.GeoDataFrame(pedestrian_traffic_data, geometry='geometry', crs="EPSG:4326")

# Convert business establishments data to a GeoDataFrame
business_gdf = gpd.GeoDataFrame(business_establishments_data, geometry='geometry', crs="EPSG:4326")

# Verify GeoDataFrame structures
print("\nPedestrian GeoDataFrame Sample:")
print(pedestrian_gdf.head())

print("\nBusiness GeoDataFrame Sample:")
print(business_gdf.head())
Pedestrian GeoDataFrame Sample:
             id  location_id sensing_date  hourday  direction_1  direction_2  \
0  522020231213           52   2023-12-13       20          368          176   
1  231820230711           23   2023-07-11       18          158          310   
2   72220230817           72   2023-08-17        2            2            4   
3  581120241106           58   2024-11-06       11          857          282   
4   23620220731           23   2022-07-31        6            8           27   

   pedestriancount sensor_name                    location   longitude  \
0              544    Eli263_T   -37.81252157, 144.9619401  144.961940   
1              468    Col623_T  -37.81909256, 144.95452748  144.954527   
2                6      ACMI_T  -37.81726338, 144.96872809  144.968728   
3             1139    Bou688_T  -37.81686075, 144.95358075  144.953581   
4               35    Col623_T  -37.81909256, 144.95452748  144.954527   

    latitude                     geometry  
0 -37.812522  POINT (144.96194 -37.81252)  
1 -37.819093  POINT (144.95453 -37.81909)  
2 -37.817263  POINT (144.96873 -37.81726)  
3 -37.816861  POINT (144.95358 -37.81686)  
4 -37.819093  POINT (144.95453 -37.81909)  

Business GeoDataFrame Sample:
   census_year  block_id  property_id  base_property_id  clue_small_area  \
0         2021        27       103596            103596  Melbourne (CBD)   
1         2021        27       103968            103968  Melbourne (CBD)   
2         2021        27       103968            103968  Melbourne (CBD)   
3         2021        27       103968            103968  Melbourne (CBD)   
4         2021        27       103968            103968  Melbourne (CBD)   

                                trading_name  \
0  TMF Corporate Services (Aust) Pty Limited   
1                         Arena Reit Limited   
2                          Taxbanter Pty Ltd   
3                                 Pask Group   
4             Webb Martin Consulting Pty Ltd   

                                    business_address  industry_anzsic4_code  \
0  Part Level 9 63 Exhibition Street MELBOURNE VI...                   6932   
1  Suite 5, Level 5 41 Exhibition Street MELBOURN...                   6720   
2  Part Suite 9, Level 9 41 Exhibition Street MEL...                   6932   
3  Part Suite 15, Level 15 41 Exhibition Street M...                   3011   
4  Part Suite 9, Level 9 41 Exhibition Street MEL...                   6932   

  industry_anzsic4_description   longitude   latitude  \
0          Accounting Services  144.971304 -37.814602   
1         Real Estate Services  144.971575 -37.815017   
2          Accounting Services  144.971575 -37.815017   
3           House Construction  144.971575 -37.815017   
4          Accounting Services  144.971575 -37.815017   

                      geometry  
0    POINT (144.9713 -37.8146)  
1  POINT (144.97158 -37.81502)  
2  POINT (144.97158 -37.81502)  
3  POINT (144.97158 -37.81502)  
4  POINT (144.97158 -37.81502)  

- Save Cleaned Data¶

This section saves the cleaned pedestrian traffic and business establishments datasets as CSV files. This step ensures that the processed data is preserved for future use or further analysis. A confirmation message is displayed to indicate successful saving of the datasets.

In [80]:
# Save cleaned pedestrian traffic data to CSV
pedestrian_traffic_data.to_csv('cleaned_pedestrian_traffic_data.csv', index=False)

# Save cleaned business establishments data to CSV
business_establishments_data.to_csv('cleaned_business_establishments_data.csv', index=False)

print("Cleaned datasets saved successfully!")
Cleaned datasets saved successfully!

Data Exploration and Visualization¶

- Visualize Pedestrian Traffic Hotspots¶

This section plots the pedestrian sensor locations from the GeoDataFrame as blue points, with axes labeled for longitude and latitude. Because every record is drawn in the same color and size, the map shows where pedestrian activity is recorded rather than how busy each location is. It gives a geographic overview of sensor coverage, which is a starting point for identifying high-traffic regions.

In [82]:
# Plot pedestrian traffic locations

plt.figure(figsize=(10, 8))
pedestrian_gdf.plot(ax=plt.gca(), color='blue', markersize=5, alpha=0.6, legend=True)
plt.title("Pedestrian Traffic Hotspots in Melbourne")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
[Figure: Pedestrian Traffic Hotspots in Melbourne (x-axis: Longitude, y-axis: Latitude)]
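
To make the map reflect traffic volume rather than sensor positions alone, the hourly records can first be aggregated per sensor. The following is a minimal sketch on toy data; `sensor_hotspots` and the column name `mean_hourly_count` are hypothetical names introduced for illustration.

```python
import pandas as pd

def sensor_hotspots(df):
    """Average hourly count per sensor, busiest first."""
    return (
        df.groupby(["location_id", "longitude", "latitude"], as_index=False)["pedestriancount"]
          .mean()
          .rename(columns={"pedestriancount": "mean_hourly_count"})
          .sort_values("mean_hourly_count", ascending=False)
          .reset_index(drop=True)
    )

# Toy hourly records (illustrative values)
toy = pd.DataFrame({
    "location_id": [52, 23, 23, 72],
    "longitude": [144.961940, 144.954527, 144.954527, 144.968728],
    "latitude": [-37.812522, -37.819093, -37.819093, -37.817263],
    "pedestriancount": [544, 468, 35, 6],
})
hotspots = sensor_hotspots(toy)
print(hotspots)
```

The `mean_hourly_count` column could then drive marker size or color in a scatter plot, so that busier sensors stand out visually.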

- Visualize Business Establishments¶

This section creates a geographic visualization of business establishments in Melbourne by plotting their locations on a map. The points are displayed in green, representing the spatial distribution of businesses. The plot helps in understanding the density and spread of business establishments across the city.

In [84]:
# Plot business locations

plt.figure(figsize=(10, 8))
business_gdf.plot(ax=plt.gca(), color='green', markersize=5, alpha=0.6, legend=True)
plt.title("Business Establishments in Melbourne")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
[Figure: Business Establishments in Melbourne (x-axis: Longitude, y-axis: Latitude)]

Combine Pedestrian and Business Data for Proximity Analysis¶

- Verify Geometry Types¶

This section verifies the types of geometries in the GeoDataFrames for pedestrian traffic and business establishments. It counts and displays the different geometry types (e.g., Point) to ensure that all spatial data is correctly formatted and suitable for geospatial analysis.

In [86]:
# Check geometry types in pedestrian GeoDataFrame
print(pedestrian_gdf.geom_type.value_counts())

# Check geometry types in business GeoDataFrame
print(business_gdf.geom_type.value_counts())
Point    2191890
Name: count, dtype: int64
Point    374210
Name: count, dtype: int64

- Perform Nearest Neighbor Analysis¶

This section performs a nearest neighbor analysis to link each business establishment with the closest pedestrian record. The geometry columns and GeoDataFrames are rebuilt, rows with empty or missing geometries are removed, and the longitude/latitude pairs of both datasets are extracted as arrays. A cKDTree built on the pedestrian coordinates is then queried for the nearest pedestrian point to each business, returning its index and distance. The matched pedestrian count and the distance are added as new columns to the business GeoDataFrame. Two points matter when interpreting the result. First, the distance is in decimal degrees, because the coordinates are unprojected EPSG:4326 values. Second, the pedestrian GeoDataFrame holds one row per sensor per hour, so nearest_pedestrian_count is the count from a single hourly record at the nearest sensor rather than an average. The updated dataset is printed to verify the result.

In [88]:
from scipy.spatial import cKDTree

# Step 1: Re-create Geometry Columns
# Pedestrian Data
pedestrian_traffic_data['geometry'] = pedestrian_traffic_data.apply(
    lambda row: Point(row['longitude'], row['latitude']), axis=1
)

# Business Data
business_establishments_data['geometry'] = business_establishments_data.apply(
    lambda row: Point(row['longitude'], row['latitude']), axis=1
)

# Convert to GeoDataFrames
pedestrian_gdf = gpd.GeoDataFrame(pedestrian_traffic_data, geometry='geometry', crs="EPSG:4326")
business_gdf = gpd.GeoDataFrame(business_establishments_data, geometry='geometry', crs="EPSG:4326")

# Step 2: Validate and Filter Geometries
# Remove invalid geometries for pedestrian data
pedestrian_gdf = pedestrian_gdf[~pedestrian_gdf.geometry.is_empty]
pedestrian_gdf = pedestrian_gdf[~pedestrian_gdf.geometry.isnull()]

# Remove invalid geometries for business data
business_gdf = business_gdf[~business_gdf.geometry.is_empty]
business_gdf = business_gdf[~business_gdf.geometry.isnull()]

# Step 3: Extract Coordinates for KDTree
pedestrian_coords = np.array(list(zip(pedestrian_gdf.geometry.x, pedestrian_gdf.geometry.y)))
business_coords = np.array(list(zip(business_gdf.geometry.x, business_gdf.geometry.y)))

# Ensure there are valid coordinates
if pedestrian_coords.shape[0] == 0:
    raise ValueError("No valid pedestrian coordinates found. Check the pedestrian GeoDataFrame.")
if business_coords.shape[0] == 0:
    raise ValueError("No valid business coordinates found. Check the business GeoDataFrame.")

# Step 4: Perform Nearest Neighbor Analysis
tree = cKDTree(pedestrian_coords)
distances, indices = tree.query(business_coords, k=1)

# Step 5: Add Nearest Pedestrian Data to Business GeoDataFrame
business_gdf['nearest_pedestrian_count'] = pedestrian_gdf.iloc[indices]['pedestriancount'].values
business_gdf['distance_to_nearest_pedestrian'] = distances

# Step 6: Verify the Result
print("\nBusiness GeoDataFrame with Nearest Pedestrian Data:")
print(business_gdf[['trading_name', 'nearest_pedestrian_count', 'distance_to_nearest_pedestrian']])
Business GeoDataFrame with Nearest Pedestrian Data:
                                     trading_name  nearest_pedestrian_count  \
0       TMF Corporate Services (Aust) Pty Limited                        50   
1                              Arena Reit Limited                       241   
2                               Taxbanter Pty Ltd                       241   
3                                      Pask Group                       241   
4                  Webb Martin Consulting Pty Ltd                       241   
...                                           ...                       ...   
374205                                     Vacant                       108   
374206                           Aeon Accessories                       108   
374207                        Swim Communications                       108   
374208                                     vacant                       108   
374209            Frank Samways Veterinary Clinic                       108   

        distance_to_nearest_pedestrian  
0                             0.001569  
1                             0.001440  
2                             0.001440  
3                             0.001440  
4                             0.001440  
...                                ...  
374205                        0.006658  
374206                        0.006955  
374207                        0.006708  
374208                        0.007151  
374209                        0.008125  

[374210 rows x 3 columns]
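
A possible refinement, sketched below on toy data, is to reduce the traffic table to one row per sensor with its mean hourly count and to express distances in meters. This is not the notebook's method: the conversion uses an equirectangular approximation (an assumption that is reasonable at city scale), and `to_local_meters`, `nearest_mean_count`, and `distance_m` are hypothetical names.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, used for an approximate conversion

def to_local_meters(lon, lat, lat0):
    """Equirectangular approximation: degrees -> meters near latitude lat0."""
    x = np.radians(lon) * EARTH_RADIUS_M * np.cos(np.radians(lat0))
    y = np.radians(lat) * EARTH_RADIUS_M
    return np.column_stack([x, y])

# Toy hourly records: two sensors, two hours each (illustrative values)
traffic = pd.DataFrame({
    "location_id": [1, 1, 2, 2],
    "longitude": [144.9650, 144.9650, 144.9700, 144.9700],
    "latitude": [-37.8140, -37.8140, -37.8140, -37.8140],
    "pedestriancount": [100, 300, 10, 30],
})
businesses = pd.DataFrame({
    "trading_name": ["A", "B"],
    "longitude": [144.9650, 144.9700],
    "latitude": [-37.8130, -37.8140],
})

# One row per sensor, carrying its mean hourly count
sensors = traffic.groupby(
    ["location_id", "longitude", "latitude"], as_index=False
)["pedestriancount"].mean()

lat0 = sensors["latitude"].mean()
tree = cKDTree(to_local_meters(sensors["longitude"], sensors["latitude"], lat0))
dist_m, idx = tree.query(
    to_local_meters(businesses["longitude"], businesses["latitude"], lat0), k=1
)

businesses["nearest_mean_count"] = sensors["pedestriancount"].to_numpy()[idx]
businesses["distance_m"] = dist_m
print(businesses)
```

Building the tree on sensors instead of hourly rows also shrinks it from millions of points to one point per sensor, which should make the query faster.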

Identify High-Traffic Business Zones¶

This section applies DBSCAN clustering to identify high-traffic business zones in Melbourne. A subset of pedestrian and business coordinates is sampled to optimize computational efficiency, and the combined data is clustered using the DBSCAN algorithm. DBSCAN groups points based on density, identifying clusters and separating noise. The clustering results are added to the sampled business GeoDataFrame, assigning a cluster label to each business. Finally, the clusters are visualized on a map, highlighting high-traffic business zones and providing insights into their spatial distribution.

In [90]:
from sklearn.cluster import DBSCAN

# Sample size for clustering
sample_size = 10000

# Sample pedestrian and business coordinates (keep the sampled business row
# positions so that cluster labels can be matched back to the correct rows)
pedestrian_sample = pedestrian_coords[
    np.random.choice(pedestrian_coords.shape[0], min(sample_size, pedestrian_coords.shape[0]), replace=False)
]
business_idx = np.random.choice(
    business_coords.shape[0], min(sample_size, business_coords.shape[0]), replace=False
)
business_sample = business_coords[business_idx]

# Combine sampled data
combined_sample_coords = np.concatenate([pedestrian_sample, business_sample], axis=0)

# Perform DBSCAN clustering (eps is in decimal degrees because the coordinates are EPSG:4326)
db = DBSCAN(eps=0.005, min_samples=5).fit(combined_sample_coords)

# Create a DataFrame for the sampled business rows with clustering results
business_sample_gdf = business_gdf.iloc[business_idx].copy()
business_sample_gdf['cluster'] = db.labels_[-len(business_sample):]

# Plot clusters for the sampled business data
plt.figure(figsize=(10, 8))
business_sample_gdf.plot(ax=plt.gca(), column='cluster', cmap='viridis', legend=True, markersize=5, alpha=0.6)
plt.title("High-Traffic Business Zones in Melbourne")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
[Figure: High-Traffic Business Zones in Melbourne (x-axis: Longitude, y-axis: Latitude; points colored by DBSCAN cluster label)]
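
With unprojected coordinates, eps=0.005 is a distance in degrees, which is roughly 550 m in the north-south direction. An alternative, sketched here on toy data, is to give DBSCAN a radius in meters through scikit-learn's haversine metric. The 100 m radius is an assumed value for illustration, not a recommendation from the original analysis.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # mean Earth radius
eps_meters = 100            # assumed neighborhood radius; tune for the use case

# Toy points: two tight groups roughly 1 km apart (latitude, longitude in degrees)
group_a = np.array([[-37.8140, 144.9650 + i * 0.0001] for i in range(6)])
group_b = np.array([[-37.8050, 144.9650 + i * 0.0001] for i in range(6)])
coords_deg = np.vstack([group_a, group_b])

# The haversine metric expects [latitude, longitude] in radians,
# and eps as an angle: meters divided by the Earth's radius.
db = DBSCAN(
    eps=eps_meters / EARTH_RADIUS_M,
    min_samples=5,
    metric="haversine",
    algorithm="ball_tree",
).fit(np.radians(coords_deg))

n_clusters = len(set(db.labels_) - {-1})
print(n_clusters, db.labels_)
```

Note the column order: the haversine metric takes latitude first, whereas the coordinate arrays in this notebook are built as (longitude, latitude).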

Identify High-Potential Retail Zones¶

This section identifies and visualizes candidate high-potential retail zones in Melbourne based on pedestrian traffic and business density. After removing empty or missing geometries, it samples up to 10,000 businesses and clusters them with DBSCAN. The cluster labels are mapped back to the full dataset, and any business outside the sample receives the noise label -1. Business density is defined as the number of rows sharing a cluster label and is added to the GeoDataFrame. The thresholds are deliberately lenient: a business qualifies when its nearest pedestrian count exceeds the 25th percentile of hourly counts and its cluster's density is below the 75th percentile of cluster sizes, a combination intended to flag low competition with adequate foot traffic. Because unsampled businesses all share the label -1, the density values mostly reflect sampling rather than local competition and should be read with caution. The qualifying businesses are printed and plotted in red over the pedestrian traffic locations.
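
One way to measure local competition without relying on cluster sizes is to count, for each business, how many other businesses lie within a fixed radius. The sketch below uses toy data and is an alternative for comparison, not the method used in the cell that follows. The 150 m radius and the equirectangular conversion to meters are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_M = 6_371_000
radius_m = 150  # assumed radius defining "nearby"

# Toy businesses as (longitude, latitude): five close together, one isolated
lonlat = np.array([
    [144.9650, -37.8140], [144.9651, -37.8140], [144.9652, -37.8141],
    [144.9650, -37.8141], [144.9651, -37.8142],
    [144.9800, -37.8000],
])

# Approximate local projection to meters
lat0 = np.radians(lonlat[:, 1].mean())
xy = np.column_stack([
    np.radians(lonlat[:, 0]) * EARTH_RADIUS_M * np.cos(lat0),
    np.radians(lonlat[:, 1]) * EARTH_RADIUS_M,
])

tree = cKDTree(xy)
# Count of points within radius_m of each business, minus the business itself
local_density = tree.query_ball_point(xy, r=radius_m, return_length=True) - 1
print(local_density)
```

A per-business density of this kind is defined for every row, so no business needs to be treated as noise merely because it was left out of a sample.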

In [92]:
# Ensure valid geometry in the GeoDataFrame
business_gdf = business_gdf[~business_gdf.geometry.is_empty]
business_gdf = business_gdf[~business_gdf.geometry.isna()]
pedestrian_gdf = pedestrian_gdf[~pedestrian_gdf.geometry.is_empty]
pedestrian_gdf = pedestrian_gdf[~pedestrian_gdf.geometry.isna()]

# Step 1: Sample business data
sample_size = 10000
business_sample = business_gdf.sample(n=min(sample_size, len(business_gdf)), random_state=42)

# Extract coordinates for clustering from the sample
business_coords_sample = np.array(list(zip(business_sample.geometry.x, business_sample.geometry.y)))

# Ensure there are valid coordinates
if len(business_coords_sample) == 0:
    raise ValueError("No valid business coordinates found in the sample. Check the business GeoDataFrame.")

# Step 2: Perform DBSCAN clustering on the sampled data
db = DBSCAN(eps=0.005, min_samples=5).fit(business_coords_sample)

# Step 3: Assign cluster labels to the sampled data
business_sample['cluster'] = db.labels_

# Step 4: Map sampled clusters back to the full dataset
cluster_mapping = dict(zip(business_sample.index, business_sample['cluster']))
business_gdf['cluster'] = business_gdf.index.map(cluster_mapping).fillna(-1).astype(int)

# Verify the cluster column
print("Unique clusters assigned:", business_gdf['cluster'].unique())

# Step 5: Calculate business density for each cluster
business_density = business_gdf.groupby('cluster').size()

# Step 6: Add business density information to the GeoDataFrame
business_gdf['business_density'] = business_gdf['cluster'].map(business_density)

# Verify the addition of business density
print("\nBusiness Density by Cluster:")
print(business_density.head())

# Step 7: Inspect Data Distributions
print("\nBusiness Density Summary:")
print(business_density.describe())

print("\nPedestrian Count Summary:")
print(pedestrian_gdf['pedestriancount'].describe())

# Step 8: Set thresholds for high-potential zones
threshold_density = business_density.quantile(0.75)  # 75th percentile for broader inclusion
pedestrian_threshold = pedestrian_gdf['pedestriancount'].quantile(0.25)  # 25th percentile for broader inclusion

# Verify thresholds
print(f"\nAdjusted Threshold for Low Business Density: {threshold_density}")
print(f"Adjusted Threshold for High Pedestrian Count: {pedestrian_threshold}")

# Step 9: Identify high-potential retail zones
high_potential_zones = business_gdf[
    (business_gdf['nearest_pedestrian_count'] > pedestrian_threshold) & 
    (business_gdf['business_density'] < threshold_density)
]

# Debugging step: Display rows that match each condition
print("\nRows matching high pedestrian count condition:")
print(business_gdf[business_gdf['nearest_pedestrian_count'] > pedestrian_threshold])

print("\nRows matching low business density condition:")
print(business_gdf[business_gdf['business_density'] < threshold_density])

# Verify high-potential zones
if high_potential_zones.empty:
    print("\nNo high-potential retail zones found. Consider further relaxing thresholds.")
else:
    print("\nHigh-Potential Retail Zones:")
    print(high_potential_zones[['trading_name', 'nearest_pedestrian_count', 'business_density']])

# Step 10: Visualize high-potential retail zones
plt.figure(figsize=(10, 8))

# Plot pedestrian traffic as base
base = pedestrian_gdf.plot(ax=plt.gca(), color='blue', markersize=5, alpha=0.6, label='Pedestrian Traffic')

if not high_potential_zones.empty:
    # Plot high-potential retail zones
    high_potential_zones.plot(ax=base, color='red', markersize=10, alpha=0.8, label='High-Potential Zones')

plt.title("High-Potential Retail Zones in Melbourne")
plt.xlabel("Longitude")
plt.ylabel("Latitude")

# Set aspect ratio to avoid errors
plt.gca().set_aspect('equal', adjustable='datalim')

plt.legend()
plt.show()
Unique clusters assigned: [-1  0  1  2]

Business Density by Cluster:
cluster
-1    364216
 0      9966
 1        23
 2         5
dtype: int64

Business Density Summary:
count         4.000000
mean      93552.500000
std      180503.310534
min           5.000000
25%          18.500000
50%        4994.500000
75%       98528.500000
max      364216.000000
dtype: float64

Pedestrian Count Summary:
count    2.191890e+06
mean     3.658121e+02
std      5.514674e+02
min      0.000000e+00
25%      3.900000e+01
50%      1.530000e+02
75%      4.360000e+02
max      8.895000e+03
Name: pedestriancount, dtype: float64

Adjusted Threshold for Low Business Density: 98528.5
Adjusted Threshold for High Pedestrian Count: 39.0

Rows matching high pedestrian count condition:
        census_year  block_id  property_id  base_property_id  clue_small_area  \
0              2021        27       103596            103596  Melbourne (CBD)   
1              2021        27       103968            103968  Melbourne (CBD)   
2              2021        27       103968            103968  Melbourne (CBD)   
3              2021        27       103968            103968  Melbourne (CBD)   
4              2021        27       103968            103968  Melbourne (CBD)   
...             ...       ...          ...               ...              ...   
374205         2020       309       593976            593976  North Melbourne   
374206         2020       310       628737            628737  North Melbourne   
374207         2020       310       628739            628739  North Melbourne   
374208         2020       310       628743            628743  North Melbourne   
374209         2020       311       101102            101102  North Melbourne   

                                     trading_name  \
0       TMF Corporate Services (Aust) Pty Limited   
1                              Arena Reit Limited   
2                               Taxbanter Pty Ltd   
3                                      Pask Group   
4                  Webb Martin Consulting Pty Ltd   
...                                           ...   
374205                                     Vacant   
374206                           Aeon Accessories   
374207                        Swim Communications   
374208                                     vacant   
374209            Frank Samways Veterinary Clinic   

                                         business_address  \
0       Part Level 9 63 Exhibition Street MELBOURNE VI...   
1       Suite 5, Level 5 41 Exhibition Street MELBOURN...   
2       Part Suite 9, Level 9 41 Exhibition Street MEL...   
3       Part Suite 15, Level 15 41 Exhibition Street M...   
4       Part Suite 9, Level 9 41 Exhibition Street MEL...   
...                                                   ...   
374205   138-140 Langford Street NORTH MELBOURNE VIC 3051   
374206        49-53 Steel Street NORTH MELBOURNE VIC 3051   
374207   126-134 Langford Street NORTH MELBOURNE VIC 3051   
374208      42-48 Straker Street NORTH MELBOURNE VIC 3051   
374209         1-3 Boundary Road NORTH MELBOURNE VIC 3051   

        industry_anzsic4_code                 industry_anzsic4_description  \
0                        6932                          Accounting Services   
1                        6720                         Real Estate Services   
2                        6932                          Accounting Services   
3                        3011                           House Construction   
4                        6932                          Accounting Services   
...                       ...                                          ...   
374205                      0                                 Vacant Space   
374206                   3493          Telecommunication Goods Wholesaling   
374207                   7000  Computer System Design and Related Services   
374208                      0                                 Vacant Space   
374209                   6970                          Veterinary Services   

         longitude   latitude                     geometry  \
0       144.971304 -37.814602    POINT (144.9713 -37.8146)   
1       144.971575 -37.815017  POINT (144.97158 -37.81502)   
2       144.971575 -37.815017  POINT (144.97158 -37.81502)   
3       144.971575 -37.815017  POINT (144.97158 -37.81502)   
4       144.971575 -37.815017  POINT (144.97158 -37.81502)   
...            ...        ...                          ...   
374205  144.936891 -37.795840  POINT (144.93689 -37.79584)   
374206  144.937127 -37.796156  POINT (144.93713 -37.79616)   
374207  144.936880 -37.796126  POINT (144.93688 -37.79613)   
374208  144.937226 -37.796541  POINT (144.93723 -37.79654)   
374209  144.938101 -37.797013   POINT (144.9381 -37.79701)   

        nearest_pedestrian_count  distance_to_nearest_pedestrian  cluster  \
0                             50                        0.001569       -1   
1                            241                        0.001440       -1   
2                            241                        0.001440       -1   
3                            241                        0.001440       -1   
4                            241                        0.001440       -1   
...                          ...                             ...      ...   
374205                       108                        0.006658       -1   
374206                       108                        0.006955       -1   
374207                       108                        0.006708       -1   
374208                       108                        0.007151       -1   
374209                       108                        0.008125       -1   

        business_density  
0                 364216  
1                 364216  
2                 364216  
3                 364216  
4                 364216  
...                  ...  
374205            364216  
374206            364216  
374207            364216  
374208            364216  
374209            364216  

[261380 rows x 16 columns]

Rows matching low business density condition:
        census_year  block_id  property_id  base_property_id  clue_small_area  \
87             2021        31       102111            102111  Melbourne (CBD)   
117            2021        31       105945            105945  Melbourne (CBD)   
123            2021        31       108968            108968  Melbourne (CBD)   
180            2021        32       102119            102119  Melbourne (CBD)   
264            2021        33       105937            105937  Melbourne (CBD)   
...             ...       ...          ...               ...              ...   
373861         2021        24       110762            110762  Melbourne (CBD)   
373876         2021        24       110762            110762  Melbourne (CBD)   
374007         2021        27       102067            102067  Melbourne (CBD)   
374114         2020       266       109849            109849          Carlton   
374157         2020       270       664627            104468        Parkville   

                                 trading_name  \
87                                     Vacant   
117        Team Building Construction Pty Ltd   
123               Apna Desi Indian Restaurant   
180                    A2M Consulting Pty Ltd   
264                   PP&E Valuations Pty Ltd   
...                                       ...   
373861                      Hearing Australia   
373876                                 Vacant   
374007          Bell Asset Management Limited   
374114  RMIT Building 51 - Frederick Campbell   
374157        National Australia Bank Limited   

                                         business_address  \
87      Shop 1-3, 608-610 Collins Street MELBOURNE VIC...   
117     Unit 5, Ground 601 Little Collins Street MELBO...   
123         Shop 5, 120 Spencer Street MELBOURNE VIC 3000   
180     Suite 2, Level 15 470 Collins Street MELBOURNE...   
264     Suite 504-505, Level 5 443 Little Collins Stre...   
...                                                   ...   
373861  Part Level 5 303 Collins Street MELBOURNE VIC ...   
373876      Level 9 303 Collins Street MELBOURNE VIC 3000   
374007  Part Level 20 101 Collins Street MELBOURNE VIC...   
374114             80-92 Victoria Street CARLTON VIC 3053   
374157  Part Ground NAB Bank Building 143 230 Grattan ...   

        industry_anzsic4_code  \
87                          0   
117                      3019   
123                      4511   
180                      6922   
264                      6720   
...                       ...   
373861                   8512   
373876                      0   
374007                   6419   
374114                   8102   
374157                   6221   

                           industry_anzsic4_description   longitude  \
87                                         Vacant Space  144.954568   
117             Other Residential Building Construction  144.954904   
123                               Cafes and Restaurants  144.954365   
180                      Surveying and Mapping Services  144.958334   
264                                Real Estate Services  144.960097   
...                                                 ...         ...   
373861                      Specialist Medical Services  144.963824   
373876                                     Vacant Space  144.963824   
374007  Other Auxiliary Finance and Investment Services  144.970773   
374114                                 Higher Education  144.964848   
374157                                          Banking  144.961209   

         latitude                     geometry  nearest_pedestrian_count  \
87     -37.818610  POINT (144.95457 -37.81861)                       601   
117    -37.818028   POINT (144.9549 -37.81803)                       601   
123    -37.818184  POINT (144.95436 -37.81818)                       601   
180    -37.817538  POINT (144.95833 -37.81754)                       305   
264    -37.816396    POINT (144.9601 -37.8164)                        57   
...           ...                          ...                       ...   
373861 -37.816628  POINT (144.96382 -37.81663)                       149   
373876 -37.816628  POINT (144.96382 -37.81663)                       149   
374007 -37.814893  POINT (144.97077 -37.81489)                        50   
374114 -37.806564  POINT (144.96485 -37.80656)                       596   
374157 -37.796772  POINT (144.96121 -37.79677)                         3   

        distance_to_nearest_pedestrian  cluster  business_density  
87                            0.000280        0              9966  
117                           0.000946        0              9966  
123                           0.000707        0              9966  
180                           0.002435        0              9966  
264                           0.001121        0              9966  
...                                ...      ...               ...  
373861                        0.001785        0              9966  
373876                        0.001785        0              9966  
374007                        0.001366        0              9966  
374114                        0.002078        0              9966  
374157                        0.003212        0              9966  

[9994 rows x 16 columns]

High-Potential Retail Zones:
                                 trading_name  nearest_pedestrian_count  \
87                                     Vacant                       601   
117        Team Building Construction Pty Ltd                       601   
123               Apna Desi Indian Restaurant                       601   
180                    A2M Consulting Pty Ltd                       305   
264                   PP&E Valuations Pty Ltd                        57   
...                                       ...                       ...   
373836        ACI Worldwide (Pacific) Pty Ltd                        57   
373861                      Hearing Australia                       149   
373876                                 Vacant                       149   
374007          Bell Asset Management Limited                        50   
374114  RMIT Building 51 - Frederick Campbell                       596   

        business_density  
87                  9966  
117                 9966  
123                 9966  
180                 9966  
264                 9966  
...                  ...  
373836              9966  
373861              9966  
373876              9966  
374007              9966  
374114              9966  

[6971 rows x 3 columns]
<Figure size 1000x800 with 0 Axes>

Interactive Mapping

This section builds an interactive Folium map centered on Melbourne. Businesses that belong to a cluster (cluster label other than -1) are drawn as blue circle markers, and high-potential zones, identified from pedestrian traffic and business density, are drawn as slightly larger red markers so that they stand out. Panning and zooming the map makes it possible to explore how these locations are distributed across the city.

In [94]:
# Initialize map centered on Melbourne
m = folium.Map(location=[-37.8136, 144.9631], zoom_start=13)

# Add clustered businesses (cluster label != -1) as blue markers
for _, row in business_gdf[business_gdf['cluster'] != -1].iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=5,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(m)

# Add high-potential zones
for _, row in high_potential_zones.iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=7,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.8
    ).add_to(m)

# Display the map
m
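
One detail in the cell above is easy to get wrong: Folium takes locations as [latitude, longitude], while a shapely point stores longitude in x and latitude in y, which is why the loops pass [row.geometry.y, row.geometry.x]. The helper below is a minimal sketch of that conversion with a range check. The name to_folium_location is hypothetical and is not part of Folium or the original notebook.

```python
def to_folium_location(x: float, y: float) -> list[float]:
    """Convert a point's (x=longitude, y=latitude) into Folium's [lat, lon] order."""
    if not -180.0 <= x <= 180.0:
        raise ValueError(f"longitude out of range: {x}")
    if not -90.0 <= y <= 90.0:
        raise ValueError(f"latitude out of range: {y}")
    return [y, x]


# Longitude and latitude of the first row in the output shown earlier.
location = to_folium_location(144.954568, -37.818610)
print(location)
```

Because Melbourne's longitude (about 144.96) lies outside the valid latitude range, passing the arguments in the wrong order raises an error immediately instead of silently placing the marker somewhere else.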

Statistical Analysis

This section summarizes each cluster: the mean business density, the mean, maximum, and minimum nearest pedestrian count, and the number of businesses. The rows of business_gdf are grouped by cluster label, and the resulting table is saved to a CSV file for later analysis or reporting. A label of -1 marks businesses that were not assigned to any cluster.

In [96]:
# Summary statistics for clusters
cluster_summary = business_gdf.groupby('cluster').agg({
    'business_density': 'mean',
    'nearest_pedestrian_count': ['mean', 'max', 'min'],
    'longitude': 'count'  # Number of businesses per cluster
}).reset_index()

cluster_summary.columns = ['Cluster', 'Avg_Business_Density', 'Avg_Pedestrian_Count', 'Max_Pedestrian_Count', 'Min_Pedestrian_Count', 'Num_Businesses']
print("\nCluster Summary Statistics:")
print(cluster_summary)

# Save cluster summary to a CSV file for further analysis
cluster_summary.to_csv('cluster_summary_statistics.csv', index=False)
Cluster Summary Statistics:
   Cluster  Avg_Business_Density  Avg_Pedestrian_Count  Max_Pedestrian_Count  \
0       -1              364216.0            296.478958                  2613   
1        0                9966.0            292.591712                  2613   
2        1                  23.0            148.000000                   148   
3        2                   5.0             47.800000                   115   

   Min_Pedestrian_Count  Num_Businesses  
0                     1          364216  
1                     1            9966  
2                   148              23  
3                     3               5  
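
To make explicit what the groupby computes, here is a dependency-free sketch of the same per-cluster aggregation on a handful of made-up rows. The records and the summarize_clusters helper are illustrative assumptions; the notebook itself uses pandas on business_gdf.

```python
from collections import defaultdict
from statistics import mean


def summarize_clusters(rows):
    """Group rows by 'cluster' and compute the count plus mean/max/min pedestrian count."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append(row["nearest_pedestrian_count"])
    return {
        cluster: {
            "Num_Businesses": len(counts),
            "Avg_Pedestrian_Count": mean(counts),
            "Max_Pedestrian_Count": max(counts),
            "Min_Pedestrian_Count": min(counts),
        }
        for cluster, counts in sorted(groups.items())
    }


# Made-up rows; -1 is the label for businesses not assigned to any cluster.
rows = [
    {"cluster": -1, "nearest_pedestrian_count": 108},
    {"cluster": 0, "nearest_pedestrian_count": 601},
    {"cluster": 0, "nearest_pedestrian_count": 305},
    {"cluster": 0, "nearest_pedestrian_count": 57},
]
summary = summarize_clusters(rows)
print(summary[0])
```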

Creating an Enhanced Interactive Map of Business Clusters Using Plotly Express

This section uses Plotly Express to build a second interactive map of business clusters in Melbourne. A random sample of up to 5,000 businesses is drawn for plotting, and rows with empty or missing geometries are dropped. The businesses are plotted with scatter_mapbox, with color indicating the cluster and marker size indicating business density. Hover tooltips show the trading name, business density, nearest pedestrian count, and cluster. The map uses the carto-positron base style, and marker opacity and size scaling are adjusted for legibility.

In [98]:
import plotly.express as px

# Sample a subset of business_gdf for visualization
sample_size = 5000  
business_sample = business_gdf.sample(n=min(sample_size, len(business_gdf)), random_state=42)

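# Ensure valid geometries in the sample (drop empty and missing geometries)
business_sample = business_sample[~business_sample.geometry.is_empty]
business_sample = business_sample[~business_sample.geometry.isna()].copy()

# Store cluster IDs as strings so Plotly assigns discrete colors and legend entries
business_sample['cluster_label'] = business_sample['cluster'].astype(str)

# Create an interactive map with enhanced features
fig = px.scatter_mapbox(
    business_sample,
    lat=business_sample.geometry.y,
    lon=business_sample.geometry.x,
    color='cluster_label',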
    size='business_density',
    hover_name='trading_name',
    hover_data={
        'business_density': True,
        'nearest_pedestrian_count': True,
        'cluster': True,
    },
    title="Enhanced Interactive Map of Business Clusters",
    mapbox_style="carto-positron",
    zoom=13,
    height=800
)

# Adjust marker opacity and size scaling
fig.update_traces(marker=dict(opacity=0.8, sizemode='area', sizeref=2. * max(business_sample['business_density']) / (40.**2), sizemin=4))

# Add legend title and formatting
fig.update_layout(
    legend_title_text="Cluster ID",
    margin={"r": 0, "t": 40, "l": 0, "b": 0}
)

# Show the interactive map
fig.show()
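
The sizeref expression in the cell above follows the rule of thumb from Plotly's bubble-chart documentation for sizemode='area': sizeref = 2 * max(size) / desired_max_size**2. The sketch below works through what that implies for the cluster sizes reported in the summary table. It assumes that marker area is proportional to the value, so that diameter grows with its square root, and that the largest marker is about desired_max_size pixels across. The function name is hypothetical, and the effect of sizemin is deliberately left out.

```python
import math


def approx_marker_diameter(value: float, max_value: float, desired_max_px: float = 40.0) -> float:
    """Approximate diameter in pixels when marker area is proportional to value."""
    sizeref = 2.0 * max_value / desired_max_px**2  # same formula as in the cell above
    # Assumed area-mode scaling, chosen so that max_value maps to desired_max_px.
    return 2.0 * math.sqrt(value / (2.0 * sizeref))


# business_density values taken from the cluster summary table.
densities = [364216, 9966, 23, 5]
for density in densities:
    print(density, round(approx_marker_diameter(density, max(densities)), 2))
```

Under these assumptions, businesses in the three real clusters would be drawn far smaller than the unclustered group, which is where sizemin=4 comes into play. If differences between clusters need to be visible, a log transform of the size column is one option.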

Actionable Insights for High-Potential Retail Zones

This section turns the high-potential retail zones into a table of actionable insights. For each zone, the trading name, nearest pedestrian count, and business density are listed alongside a recommendation. The recommendation text is the same for every row, since all of these zones were selected for high foot traffic and low competition. If no high-potential zones were identified, a message is printed instead.

In [100]:
# Generate actionable insights for high-potential zones and display them
if not high_potential_zones.empty:
    insights = high_potential_zones[['trading_name', 'nearest_pedestrian_count', 'business_density']].copy()
    # Every zone was selected by the same rule, so the recommendation is one constant string
    insights['recommendation'] = (
        "High potential for new retail opportunities based on high pedestrian count and low competition."
    )
    # Display the actionable insights
    print("\nActionable Insights for High-Potential Zones:")
    print(insights)
else:
    print("No high-potential zones found to generate insights.")
Actionable Insights for High-Potential Zones:
                                 trading_name  nearest_pedestrian_count  \
87                                     Vacant                       601   
117        Team Building Construction Pty Ltd                       601   
123               Apna Desi Indian Restaurant                       601   
180                    A2M Consulting Pty Ltd                       305   
264                   PP&E Valuations Pty Ltd                        57   
...                                       ...                       ...   
373836        ACI Worldwide (Pacific) Pty Ltd                        57   
373861                      Hearing Australia                       149   
373876                                 Vacant                       149   
374007          Bell Asset Management Limited                        50   
374114  RMIT Building 51 - Frederick Campbell                       596   

        business_density                                     recommendation  
87                  9966  High potential for new retail opportunities ba...  
117                 9966  High potential for new retail opportunities ba...  
123                 9966  High potential for new retail opportunities ba...  
180                 9966  High potential for new retail opportunities ba...  
264                 9966  High potential for new retail opportunities ba...  
...                  ...                                                ...  
373836              9966  High potential for new retail opportunities ba...  
373861              9966  High potential for new retail opportunities ba...  
373876              9966  High potential for new retail opportunities ba...  
374007              9966  High potential for new retail opportunities ba...  
374114              9966  High potential for new retail opportunities ba...  

[6971 rows x 4 columns]
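
Because every row above receives the same sentence, the recommendation column adds no information beyond membership in high_potential_zones. One way to make it more useful is to tier the text by foot traffic. The sketch below does this for plain dictionaries. The cutoffs of 500 and 100 pedestrians and the wording of each tier are illustrative assumptions, not values from the analysis.

```python
def recommend(pedestrian_count: int, high: int = 500, moderate: int = 100) -> str:
    """Return a recommendation tier based on the nearby pedestrian count.

    The default cutoffs are hypothetical and should be tuned to the data.
    """
    if pedestrian_count >= high:
        return "Priority site: very high foot traffic nearby."
    if pedestrian_count >= moderate:
        return "Candidate site: moderate foot traffic nearby."
    return "Low priority: limited foot traffic nearby."


# A few rows copied from the output above.
zones = [
    {"trading_name": "Vacant", "nearest_pedestrian_count": 601},
    {"trading_name": "A2M Consulting Pty Ltd", "nearest_pedestrian_count": 305},
    {"trading_name": "PP&E Valuations Pty Ltd", "nearest_pedestrian_count": 57},
]
for zone in zones:
    zone["recommendation"] = recommend(zone["nearest_pedestrian_count"])
    print(zone["trading_name"], "->", zone["recommendation"])
```

With a DataFrame, the same function could be applied with insights['nearest_pedestrian_count'].map(recommend).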

Interactive Map for Business Clusters and High-Potential Zones in Melbourne

This section extends the Folium map with tooltips, dynamic marker grouping, and a legend. The map is centered on Melbourne. Clustered businesses are drawn as blue markers and high-potential zones as red markers, both with the same radius for visual consistency. The blue markers are added through the Folium MarkerCluster plugin, which merges nearby markers into color-coded group circles to reduce clutter at low zoom levels. The red markers are added directly to the map, so they are never merged into groups. Tooltips show the trading name, cluster ID, and business density for clustered businesses, and the trading name, pedestrian count, and business density for high-potential zones. A custom HTML legend explains the color coding of the markers and group circles.

In [163]:
from folium.plugins import MarkerCluster

# Initialize the map centered on Melbourne
melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=13)

# Create a MarkerCluster layer that groups nearby markers as the map is zoomed out
marker_cluster = MarkerCluster().add_to(melbourne_map)

# Check for businesses in clusters
clustered_businesses = business_gdf[business_gdf['cluster'] != -1]
print(f"Number of clustered businesses: {clustered_businesses.shape[0]}")

# Add blue markers for clustered businesses
for _, row in clustered_businesses.iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=5,  # Same size for blue markers
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.6,
        tooltip=f"Trading Name: {row['trading_name']}<br>Cluster: {row['cluster']}<br>Business Density: {row['business_density']}"
    ).add_to(marker_cluster)

# Highlight high-potential zones
for _, row in high_potential_zones.iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=5,  # Same size for red markers
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.8,
        tooltip=f"High-Potential Zone:<br>Trading Name: {row['trading_name']}<br>Pedestrian Count: {row['nearest_pedestrian_count']}<br>Business Density: {row['business_density']}"
    ).add_to(melbourne_map)

# Add a legend (customized with HTML)
# Group-circle colors follow the MarkerCluster defaults: green (small), yellow (medium), orange (large)
legend_html = """
<div style="position: fixed;
            bottom: 50px; left: 50px; width: 350px; height: 150px;
            background-color: white; border:2px solid grey; z-index:1000; font-size:14px;">
&nbsp;<b>Legend</b><br>
&nbsp;<i style="background:blue; color:white; padding:5px;">&nbsp;&nbsp;&nbsp;</i> Clustered Businesses<br>
&nbsp;<i style="background:red; color:white; padding:5px;">&nbsp;&nbsp;&nbsp;</i> High-Potential Zones<br>
&nbsp;<i style="background:green; color:white; padding:5px;">&nbsp;&nbsp;&nbsp;</i> Small Cluster Group<br>
&nbsp;<i style="background:yellow; color:black; padding:5px;">&nbsp;&nbsp;&nbsp;</i> Medium Cluster Group<br>
&nbsp;<i style="background:orange; color:black; padding:5px;">&nbsp;&nbsp;&nbsp;</i> Large Cluster Group<br>
</div>
"""
melbourne_map.get_root().html.add_child(folium.Element(legend_html))

# Display the map
melbourne_map
Number of clustered businesses: 9994
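
The colored group circles come from the Leaflet.markercluster library that Folium's MarkerCluster wraps. Its default icon function appears to assign one of three style classes by the number of markers in a group: small (green) below 10, medium (yellow) below 100, and large (orange) otherwise. The sketch below mirrors that rule in Python purely for illustration. The function name is hypothetical, and the thresholds are an assumption that should be checked against the plugin version in use.

```python
def cluster_icon_class(child_count: int) -> str:
    """Mimic Leaflet.markercluster's default size classes (assumed thresholds)."""
    if child_count < 10:
        return "small"   # drawn green by the default stylesheet
    if child_count < 100:
        return "medium"  # drawn yellow
    return "large"       # drawn orange


# Example group sizes; real group sizes change with the zoom level.
for size in (5, 23, 9966):
    print(size, cluster_icon_class(size))
```

Note that these marker groups are a display feature driven by zoom level and screen distance. They are unrelated to the cluster column in business_gdf.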

Key Insights and Findings from the Analysis

Factors Influencing Retail Store Optimization:

Pedestrian Traffic Patterns:

  • Peak Pedestrian Activity: The analysis identified specific locations with consistently high foot traffic, such as major intersections, transportation hubs, and shopping districts.
  • Dynamic Pedestrian Flow: Hourly pedestrian counts reveal peak activity during mid-mornings and early evenings, making these times ideal for retail engagement strategies.

Business Density:

  • Clustered Markets: High-density clusters indicate saturated retail markets where businesses face significant competition.
  • Underserved Areas: Low-density regions with substantial pedestrian counts represent opportunities for new retail outlets to address unmet demand.

High-Potential Zones:

  • Strategic Locations: Using pedestrian counts and business density data, 6,971 business locations were flagged as high-potential zones for new retail. These zones combine high visibility (foot traffic) with limited competition (low business density).
  • Economic Potential: High-potential zones align with strategic urban areas, making them attractive for investment and long-term growth.
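
The selection rule described under High-Potential Zones can be sketched as a simple filter: keep locations whose pedestrian count is at or above one threshold and whose business density is at or below another. The thresholds and the sample records below are assumptions for illustration; the cutoffs used in the notebook are not restated in this section.

```python
def is_high_potential(pedestrian_count, business_density, min_pedestrians, max_density):
    """True when foot traffic is high enough and competition is low enough."""
    return pedestrian_count >= min_pedestrians and business_density <= max_density


# Illustrative records: (name, nearest pedestrian count, business density).
candidates = [
    ("Site A", 601, 9966),
    ("Site B", 3, 9966),
    ("Site C", 596, 364216),
]
# Hypothetical cutoffs; in practice these might come from quantiles of the data.
selected = [
    name
    for name, count, density in candidates
    if is_high_potential(count, density, min_pedestrians=50, max_density=10000)
]
print(selected)  # ['Site A']
```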

Recommendations for Strategic Retail Planning

Invest in High-Potential Zones:

  • Focus on zones with high pedestrian counts and low business density to establish retail outlets in locations with untapped demand and high visibility.

Data-Driven Marketing:

  • Utilize hourly pedestrian traffic patterns to schedule promotions, events, and customer engagement activities during peak footfall times for maximum impact.
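
Finding the peak footfall hours mentioned above amounts to averaging counts by hour of day and ranking the hours. A minimal sketch with made-up hourly counts follows. The sample numbers and the top_hours helper are assumptions, and real input would come from the pedestrian counting dataset.

```python
from collections import defaultdict
from statistics import mean


def top_hours(readings, n=2):
    """Return the n hours of day with the highest average pedestrian count."""
    by_hour = defaultdict(list)
    for hour, count in readings:
        by_hour[hour].append(count)
    averages = {hour: mean(counts) for hour, counts in by_hour.items()}
    return sorted(averages, key=averages.get, reverse=True)[:n]


# Made-up (hour_of_day, pedestrian_count) readings across two days.
readings = [
    (8, 300), (11, 900), (14, 600), (18, 1100),
    (8, 340), (11, 860), (14, 640), (18, 1000),
]
print(top_hours(readings))  # [18, 11]
```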

Enhance Retail Presence in Existing Clusters:

  • For businesses in dense clusters, explore partnerships and complementary offerings to maximize customer retention and differentiate from competitors.

Utilize Geospatial Data for Expansion:

  • Regularly update pedestrian and business density data to adapt to changing urban dynamics and prioritize areas of growth or high activity for future expansions.

Improve Accessibility and Infrastructure:

  • Advocate for urban enhancements in high-potential zones, such as better signage, pedestrian-friendly streets, and improved public transport connectivity, to attract more foot traffic.

Conclusion

This use case demonstrates the power of combining pedestrian traffic data and business density analysis to inform strategic retail decision-making. By leveraging these insights, businesses can:

  • Identify high-potential zones with strong foot traffic and limited competition.
  • Optimize marketing strategies to align with pedestrian flow patterns.
  • Enhance urban retail planning to align with customer behavior and demand.

These recommendations aim to create smarter retail strategies that enhance visibility, competitiveness, and profitability, contributing to Melbourne's economic vitality and urban growth.
